{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Recommending Movies\n",
    "\n",
    "The [MovieLens 20M](http://files.grouplens.org/datasets/movielens/ml-20m-README.html) dataset contains 20 million user ratings from 1 to 5 of thousands of movies. In this demo we'll build a simple recommendation system which will use this data to suggest 25 movies based on a seed movie you provide."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The notebook cells below use `pymldb`'s `Connection` class to make [REST API](../../../../doc/#builtin/WorkingWithRest.md.html) calls. You can check out the [Using `pymldb` Tutorial](../../../../doc/nblink.html#_tutorials/Using pymldb Tutorial) for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from pymldb import Connection\n",
    "mldb = Connection()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download the MovieLens 20M data\n",
    "\n",
    "We'll start by using some command-line tools to download and decompress the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Archive:  /mldb_data/data/ml-20m.zip\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "replace /mldb_data/data/ml-20m/links.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename:  NULL\n",
      "(EOF or read error, treating as \"[N]one\" ...)\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "mkdir -p /mldb_data/data\n",
    "curl \"http://public.mldb.ai/ml-20m.zip\" 2>/dev/null  > /mldb_data/data/ml-20m.zip\n",
    "unzip /mldb_data/data/ml-20m.zip -d /mldb_data/data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Summary\n",
      "=======\n",
      "\n",
      "This dataset (ml-20m) describes 5-star rating and free-text tagging activity from [MovieLens](http://movielens.org), a movie recommendation service. It contains 20000263 ratings and 465564 tag applications across 27278 movies. These data were created by 138493 users between January 09, 1995 and March 31, 2015. This dataset was generated on March 31, 2015.\n",
      "\n",
      "Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided.\n",
      "\n",
      "The data are contained in four files, `links.csv`, `movies.csv`, `ratings.csv` and `tags.csv`. More details about the contents and use of all these files follows.\n",
      "\n",
      "This and other GroupLens data sets are publicly available for download at <http://grouplens.org/datasets/>.\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "head /mldb_data/data/ml-20m/README.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "userId,movieId,rating,timestamp\r\n",
      "1,2,3.5,1112486027\r\n",
      "1,29,3.5,1112484676\r\n",
      "1,32,3.5,1112484819\r\n",
      "1,47,3.5,1112484727\r\n",
      "1,50,3.5,1112484580\r\n",
      "1,112,3.5,1094785740\r\n",
      "1,151,4.0,1094785734\r\n",
      "1,223,4.0,1112485573\r\n",
      "1,253,4.0,1112484940\r\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "head /mldb_data/data/ml-20m/ratings.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load the data into MLDB\n",
    "\n",
    "See the [Loading Data Tutorial](../../../../doc/nblink.html#_tutorials/Loading Data Tutorial) guide for more details on how to get data into MLDB. \n",
    "\n",
    "Here we load a text file and use the `pivot` aggregator to create a sparse matrix representation of the ratings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Response [201]>\n",
      "<Response [201]>\n",
      "CPU times: user 18.4 ms, sys: 4.22 ms, total: 22.7 ms\n",
      "Wall time: 1min 19s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "print mldb.put('/v1/procedures/import_mvlns', {\n",
    "    \"type\": \"import.text\", \n",
    "    \"params\": {\n",
    "        \"dataFileUrl\":\"file:///mldb_data/data/ml-20m/ratings.csv\",\n",
    "        \"outputDataset\": \"mvlns_ratings_csv\",\n",
    "        \"runOnCreation\": True\n",
    "    }\n",
    "})\n",
    "\n",
    "print mldb.put('/v1/procedures/process_mvlns', {\n",
    "    \"type\": \"transform\",\n",
    "    \"params\": {\n",
    "        \"inputData\": \"\"\"\n",
    "            select pivot(movieId, rating) as *\n",
    "            named userId \n",
    "            from mvlns_ratings_csv\n",
    "            group by userId\n",
    "            \"\"\",\n",
    "        \"outputDataset\": \"mvlns_ratings\",\n",
    "        \"runOnCreation\": True\n",
    "    }\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Take a peek at the dataset\n",
    "\n",
    "We'll use the [Query API](../../../../doc/#builtin/sql/QueryAPI.md.html). Each row is a user, each column is a movie, and the cell value is the rating the user gave the movie."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>21</th>\n",
       "      <th>58</th>\n",
       "      <th>69</th>\n",
       "      <th>163</th>\n",
       "      <th>246</th>\n",
       "      <th>357</th>\n",
       "      <th>441</th>\n",
       "      <th>471</th>\n",
       "      <th>858</th>\n",
       "      <th>908</th>\n",
       "      <th>...</th>\n",
       "      <th>3255</th>\n",
       "      <th>3260</th>\n",
       "      <th>3386</th>\n",
       "      <th>3418</th>\n",
       "      <th>3425</th>\n",
       "      <th>3654</th>\n",
       "      <th>3713</th>\n",
       "      <th>3865</th>\n",
       "      <th>3879</th>\n",
       "      <th>3897</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>_rowName</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>138492</th>\n",
       "      <td>3</td>\n",
       "      <td>4.5</td>\n",
       "      <td>4.5</td>\n",
       "      <td>3.5</td>\n",
       "      <td>4</td>\n",
       "      <td>4.5</td>\n",
       "      <td>5</td>\n",
       "      <td>4.5</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>138491</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>138490</th>\n",
       "      <td>3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 250 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "          21   58   69  163  246  357  441  471  858  908  ...   3255  3260  \\\n",
       "_rowName                                                   ...                \n",
       "138492     3  4.5  4.5  3.5    4  4.5    5  4.5    5    4  ...    NaN   NaN   \n",
       "138491   NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   \n",
       "138490     3  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  ...      4     4   \n",
       "\n",
       "          3386  3418  3425  3654  3713  3865  3879  3897  \n",
       "_rowName                                                  \n",
       "138492     NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  \n",
       "138491     NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  \n",
       "138490       3     4     2     4     4     1     3     4  \n",
       "\n",
       "[3 rows x 250 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mldb.query(\"select * from mvlns_ratings limit 3\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Singular Value Decomposition (SVD)\n",
    "\n",
    "We will create and run a [Procedure](../../../../doc/#builtin/procedures/Procedures.md.html) of type [`svd.train`](../../../../doc/#builtin/procedures/Svd.md.html). This creates an `embedding` dataset where each row is a movie and the columns represent coordinates in a 100-dimensional space. Similar movies end up closer to each other than dissimilar movies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Response [201]>\n"
     ]
    }
   ],
   "source": [
    "print mldb.put('/v1/procedures/mvlns_svd', {\n",
    "    \"type\" : \"svd.train\",\n",
    "    \"params\" : {\n",
    "        \"trainingData\" : \"select COLUMN EXPR (where rowCount() > 3) from mvlns_ratings\",\n",
    "        \"columnOutputDataset\" : \"mvlns_svd_embedding\",\n",
    "        \"modelFileUrl\": \"file://models/mvlns.svd\",\n",
    "        \"functionName\": \"mvlns_svd_embedder\",\n",
    "        \"runOnCreation\": True\n",
    "    }\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Explore the results!\n",
    "\n",
    "Our dataset has `movieId`s but humans think about movie names so we'll load up the movie names in a dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Response [201]>\n"
     ]
    }
   ],
   "source": [
    "from ipywidgets import interact, interact_manual\n",
    "from uuid import uuid4\n",
    "\n",
    "print mldb.put('/v1/procedures/import_movies', {\n",
    "    \"type\": \"import.text\", \n",
    "    \"params\": {\n",
    "        \"dataFileUrl\":\"file:///mldb_data/data/ml-20m/movies.csv\",\n",
    "        \"outputDataset\": \"movies\",\n",
    "        \"select\": \"title, movieId\",\n",
    "        \"named\": \"movieId\",\n",
    "        \"runOnCreation\": True\n",
    "    }\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A simple search function to find all movies (and corresponding `movieId`s) whose names contain a string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>_rowName</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>115875</th>\n",
       "      <td>Toy Story Toons: Hawaiian Vacation (2011)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3114</th>\n",
       "      <td>Toy Story 2 (1999)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>120474</th>\n",
       "      <td>Toy Story That Time Forgot (2014)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>106022</th>\n",
       "      <td>Toy Story of Terror (2013)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>78499</th>\n",
       "      <td>Toy Story 3 (2010)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Toy Story (1995)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>120468</th>\n",
       "      <td>Toy Story Toons: Partysaurus Rex (2012)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>115879</th>\n",
       "      <td>Toy Story Toons: Small Fry (2011)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                              title\n",
       "_rowName                                           \n",
       "115875    Toy Story Toons: Hawaiian Vacation (2011)\n",
       "3114                             Toy Story 2 (1999)\n",
       "120474            Toy Story That Time Forgot (2014)\n",
       "106022                   Toy Story of Terror (2013)\n",
       "78499                            Toy Story 3 (2010)\n",
       "1                                  Toy Story (1995)\n",
       "120468      Toy Story Toons: Partysaurus Rex (2012)\n",
       "115879            Toy Story Toons: Small Fry (2011)"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "@interact\n",
    "def movie_search(x = \"toy story\"):\n",
    "    return mldb.query(\"select title from movies where regex_match(lower(title), '.*%s.*')\" % x.strip().lower())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's create a dataset to hold user preferences, and a simple function to simulate a user rating movies they like and movies they dislike, based on the `movie_search` function above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Response [201]>\n",
      "<Response [201]>\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>1</th>\n",
       "      <th>589</th>\n",
       "      <th>1240</th>\n",
       "      <th>3114</th>\n",
       "      <th>4934</th>\n",
       "      <th>6537</th>\n",
       "      <th>68791</th>\n",
       "      <th>78499</th>\n",
       "      <th>102425</th>\n",
       "      <th>106022</th>\n",
       "      <th>115875</th>\n",
       "      <th>115879</th>\n",
       "      <th>120468</th>\n",
       "      <th>120474</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>_rowName</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>result</th>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          1  589  1240  3114  4934  6537  68791  78499  102425  106022  \\\n",
       "_rowName                                                                 \n",
       "result    5    1     1     5     1     1      1      5       1       5   \n",
       "\n",
       "          115875  115879  120468  120474  \n",
       "_rowName                                  \n",
       "result         5       5       5       5  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print mldb.put(\"/v1/datasets/mvlns_user_prefs\", {\"type\": \"sparse.mutable\"})\n",
    "\n",
    "print mldb.put(\"/v1/functions/preferences\", {\n",
    "    \"type\": \"sql.query\",\n",
    "    \"params\": {\n",
    "        \"query\": \"select {*} as p from mvlns_user_prefs where rowName()=$user\"\n",
    "    }\n",
    "})\n",
    "\n",
    "def save_prefs(user_id, likes, dislikes):\n",
    "    for rating, search_terms in zip([5,1],[likes, dislikes]):\n",
    "        for x in search_terms.split(\",\"):\n",
    "            if len(x) > 3:\n",
    "                mldb.post(\"/v1/datasets/mvlns_user_prefs/rows\", {\n",
    "                     \"rowName\":user_id, \n",
    "                     \"columns\": [[str(m), rating, 0] for m in movie_search(x).index]\n",
    "                })\n",
    "    mldb.post(\"/v1/datasets/mvlns_user_prefs/commit\", {})\n",
    "\n",
    "save_prefs(\"janedoe\", \"Toy Story\", \"Terminator\")\n",
    "mldb.query(\"select preferences({ user: 'janedoe' })[p] as *\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With all that done, we can now build a recommendation engine out of a simple SQL query by mapping a user's preferences into the same space as the movie embeddings (i.e. embedding the user's preferences) and looking for the nearest movies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<Response [201]>\n",
      "<Response [201]>\n"
     ]
    }
   ],
   "source": [
    "print mldb.put(\"/v1/functions/nearest_movies\", {\n",
    "    \"type\": \"embedding.neighbors\",\n",
    "    \"params\": {\n",
    "        \"dataset\": \"mvlns_svd_embedding\",\n",
    "        \"defaultNumNeighbors\": 25,\n",
    "        \"columnName\": \"embedding\"\n",
    "    }\n",
    "})\n",
    "\n",
    "print mldb.put(\"/v1/functions/recommendations\", {\n",
    "    \"type\": \"sql.query\",\n",
    "    \"params\": {\n",
    "        \"query\": \"\"\"\n",
    "            select nearest_movies({ \n",
    "                    coords: mvlns_svd_embedder({\n",
    "                        row: preferences({ user: $user })[p]\n",
    "                    })[embedding] \n",
    "                })[distances] as r\n",
    "        \"\"\"\n",
    "    }\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's a simple function which lets you simulate the results of liking and disliking certain movies and getting back the resulting recommendations. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>m.title</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>_rowName</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>316</th>\n",
       "      <td>Stargate (1994)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1682</th>\n",
       "      <td>Truman Show, The (1998)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2617</th>\n",
       "      <td>Mummy, The (1999)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1222</th>\n",
       "      <td>Full Metal Jacket (1987)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2355</th>\n",
       "      <td>Bug's Life, A (1998)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6365</th>\n",
       "      <td>Matrix Reloaded, The (2003)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3527</th>\n",
       "      <td>Predator (1987)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2329</th>\n",
       "      <td>American History X (1998)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>292</th>\n",
       "      <td>Outbreak (1995)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>145</th>\n",
       "      <td>Bad Boys (1995)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4886</th>\n",
       "      <td>Monsters, Inc. (2001)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000</th>\n",
       "      <td>Lethal Weapon (1987)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1784</th>\n",
       "      <td>As Good as It Gets (1997)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>434</th>\n",
       "      <td>Cliffhanger (1993)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1961</th>\n",
       "      <td>Rain Man (1988)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>349</th>\n",
       "      <td>Clear and Present Danger (1994)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2115</th>\n",
       "      <td>Indiana Jones and the Temple of Doom (1984)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5445</th>\n",
       "      <td>Minority Report (2002)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>594</th>\n",
       "      <td>Snow White and the Seven Dwarfs (1937)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1641</th>\n",
       "      <td>Full Monty, The (1997)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2003</th>\n",
       "      <td>Gremlins (1984)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                              m.title\n",
       "_rowName                                             \n",
       "316                                   Stargate (1994)\n",
       "1682                          Truman Show, The (1998)\n",
       "2617                                Mummy, The (1999)\n",
       "1222                         Full Metal Jacket (1987)\n",
       "2355                             Bug's Life, A (1998)\n",
       "6365                      Matrix Reloaded, The (2003)\n",
       "3527                                  Predator (1987)\n",
       "2329                        American History X (1998)\n",
       "292                                   Outbreak (1995)\n",
       "145                                   Bad Boys (1995)\n",
       "4886                            Monsters, Inc. (2001)\n",
       "2000                             Lethal Weapon (1987)\n",
       "1784                        As Good as It Gets (1997)\n",
       "434                                Cliffhanger (1993)\n",
       "1961                                  Rain Man (1988)\n",
       "349                   Clear and Present Danger (1994)\n",
       "2115      Indiana Jones and the Temple of Doom (1984)\n",
       "5445                           Minority Report (2002)\n",
       "594            Snow White and the Seven Dwarfs (1937)\n",
       "1641                           Full Monty, The (1997)\n",
       "2003                                  Gremlins (1984)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def recommend(likes=\"Toy Story, Terminator\", dislikes=\"Star Trek\"):\n",
    "    \n",
    "    # here we simulate a new user saving these preferences\n",
    "    user_id = str(uuid4())\n",
    "    save_prefs(user_id, likes, dislikes)\n",
    "    \n",
    "    # we can then run an SQL query to:\n",
    "    #   - retrieve recommendations\n",
    "    #   - transpose and join them to movies to get titles\n",
    "    #   - exclude the already-rated movies from the result\n",
    "    return mldb.query(\"\"\"\n",
    "        select m.title \n",
    "        named m.movieId\n",
    "        from \n",
    "            transpose(( select recommendations({ user: '%(user)s' }) )) as r \n",
    "            join movies as m on r.rowPathElement(2) = m.rowPathElement(0)\n",
    "        where m.movieId not in (keys of preferences({ user: '%(user)s' })[p])\n",
    "        order by r.result\n",
    "        \"\"\" % dict(user=user_id))\n",
    "\n",
    "recommend(likes=\"Toy Story, Terminator\", dislikes=\"Star Trek\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's an interactive form that lets you play with this function to see if you agree with the recommendations!\n",
    "\n",
    "NOTE: the interactive part of this demo only works if you're running this Notebook live, not if you're looking at a static copy on http://docs.mldb.ai. See the documentation for [Running MLDB](../../../../doc/#builtin/Running.md.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>m.title</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>_rowName</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>316</th>\n",
       "      <td>Stargate (1994)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1682</th>\n",
       "      <td>Truman Show, The (1998)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2617</th>\n",
       "      <td>Mummy, The (1999)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1222</th>\n",
       "      <td>Full Metal Jacket (1987)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2355</th>\n",
       "      <td>Bug's Life, A (1998)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6365</th>\n",
       "      <td>Matrix Reloaded, The (2003)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3527</th>\n",
       "      <td>Predator (1987)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2329</th>\n",
       "      <td>American History X (1998)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>292</th>\n",
       "      <td>Outbreak (1995)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>145</th>\n",
       "      <td>Bad Boys (1995)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4886</th>\n",
       "      <td>Monsters, Inc. (2001)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000</th>\n",
       "      <td>Lethal Weapon (1987)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1784</th>\n",
       "      <td>As Good as It Gets (1997)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>434</th>\n",
       "      <td>Cliffhanger (1993)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1961</th>\n",
       "      <td>Rain Man (1988)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>349</th>\n",
       "      <td>Clear and Present Danger (1994)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2115</th>\n",
       "      <td>Indiana Jones and the Temple of Doom (1984)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5445</th>\n",
       "      <td>Minority Report (2002)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>594</th>\n",
       "      <td>Snow White and the Seven Dwarfs (1937)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1641</th>\n",
       "      <td>Full Monty, The (1997)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2003</th>\n",
       "      <td>Gremlins (1984)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                              m.title\n",
       "_rowName                                             \n",
       "316                                   Stargate (1994)\n",
       "1682                          Truman Show, The (1998)\n",
       "2617                                Mummy, The (1999)\n",
       "1222                         Full Metal Jacket (1987)\n",
       "2355                             Bug's Life, A (1998)\n",
       "6365                      Matrix Reloaded, The (2003)\n",
       "3527                                  Predator (1987)\n",
       "2329                        American History X (1998)\n",
       "292                                   Outbreak (1995)\n",
       "145                                   Bad Boys (1995)\n",
       "4886                            Monsters, Inc. (2001)\n",
       "2000                             Lethal Weapon (1987)\n",
       "1784                        As Good as It Gets (1997)\n",
       "434                                Cliffhanger (1993)\n",
       "1961                                  Rain Man (1988)\n",
       "349                   Clear and Present Danger (1994)\n",
       "2115      Indiana Jones and the Temple of Doom (1984)\n",
       "5445                           Minority Report (2002)\n",
       "594            Snow White and the Seven Dwarfs (1937)\n",
       "1641                           Full Monty, The (1997)\n",
       "2003                                  Gremlins (1984)"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "interact_manual(recommend)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Where to next?\n",
    "\n",
    "Check out the other [Tutorials and Demos](../../../../doc/#builtin/Demos.md.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
